Form Extraction
The Form Extraction process removes the form outline and form text from an image so that only the user input remains. This can make other processes, such as OmniPage Zone OCR, more accurate because the lines and other form elements no longer interfere. For Form Extraction to work properly, the image being processed must match the form designated for the process.
To use Form Extraction for a particular process such as OmniPage Zone OCR while keeping the form on the stored image, use Form Extraction as a local enhancement within that process. The process still benefits from the cleaned image, and the final stored image is not affected.
Example: The City of Wonderland Department of Planning scans building permit applications from its old files. Many of them were typed on the forms in such a way that the letters occasionally overlap the lines of the form. The department uses Form Extraction to remove the lines and words of the form so it can capture the information that was typed into it.
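The idea of removing the master form from a filled-in form can be sketched in a few lines of Python. This is only a conceptual illustration, not how Quick Fields implements Form Extraction: it assumes both images are already aligned, the same size, and reduced to 1-bit pixels (1 = foreground, 0 = background), and the function name extract_user_input is hypothetical.

```python
def extract_user_input(filled, master):
    """Keep a foreground pixel only where the blank master form has none.

    Both arguments are lists of rows of 0/1 pixels with identical
    dimensions (an assumption of this sketch).
    """
    return [
        [f if m == 0 else 0 for f, m in zip(filled_row, master_row)]
        for filled_row, master_row in zip(filled, master)
    ]


# Blank form: a single horizontal line on the middle row.
master = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]

# Filled-in form: the same line plus a vertical stroke that crosses it.
filled = [
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]

result = extract_user_input(filled, master)
for row in result:
    print(row)
```

In the result, the form line is gone, but so is the pixel where the stroke crossed it. That kind of damage is what the Character Reconstruction property described below is meant to repair.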
To configure Form Extraction
- In the Session Configuration Pane, select the stage of processing where you want to use Form Extraction.
- In the Tasks Pane, select Form Extraction. Under More Options, you can select Wizard to display more information about each property or Skip Wizard to display the properties all at once without additional information.
- Master Form: The master form is an image of a blank form that matches the forms you will be processing. In Form Extraction, the lines and text of the blank master form will be removed from the forms that are filled in, leaving only the data entered on the form. There are three ways to add a master form:
- Scan: Scan in the master form using the currently configured scan source.
- Import: Import an image that is currently stored on a network drive.
- Copy from a sample image: If you have already added a custom sample image that is the same image you want to use as the master form, you can copy it.
- Page Range: When configuring an image enhancement in Page Processing or Storage Processing, you will be prompted to specify a page range. In other stages, default settings will automatically be applied.
- Excess Cleanup Level: You can speed up processing by eliminating images from consideration by the other Form Identification properties if they fail to meet certain criteria in this step.
- Background Color: Select this option to eliminate images whose background color or bit depth differs from the master form.
- Relative Intensity: Select this option to eliminate images that are much darker than the Master Form (those that contain a significantly greater number of foreground pixels). Do not use this option if you want to identify forms that have been faxed or photocopied.
- Confidence: Select the percentage to which an image must match the Master Form to be considered a successful match. Zero means the image is completely different from the Master Form, and 100 means they are exactly the same.
- Low (60%)
- Medium (75%)
- High (90%)
- Specify a level of confidence: Specify a value from 0 to 100.
- Cleanup Level: This option helps you compensate for forms with blurry or fragmented lines and shapes by expanding the region within which Quick Fields will look for elements that correspond to the master form.
Example: Selecting Medium (2) removes all stray marks located within two pixels of a line.
- Character Reconstruction: Characters that intersect lines may be damaged during line removal. The Character Reconstruction option repairs any damaged characters. Specify the scope of character reconstruction. The numbers represent the maximum gap, in pixels, of a character disruption to be reconstructed.
Tip: If damaged characters are not sufficiently repaired during processing or testing, select a higher value. If characters are improperly repaired, select a lower value.
- Optional: To preview how this enhancement will affect scanned images and OCRed or extracted text, test processes. For the best results, add a custom sample image before testing. Adjust and test until you are satisfied with the results.
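Two of the properties above, Confidence and Character Reconstruction, can be pictured with a short Python sketch. It is an illustration under stated assumptions rather than the product's actual logic: the match score is taken as a given number from 0 to 100, a score equal to the threshold is assumed to count as a match, and a character is simplified to a single row or column of 1-bit pixels. The names meets_confidence and fill_gaps are hypothetical.

```python
# Preset confidence levels as listed in the Confidence property.
CONFIDENCE_PRESETS = {"Low": 60, "Medium": 75, "High": 90}


def meets_confidence(score, level):
    """Return True if a 0-100 match score reaches the chosen level.

    level is either a preset name or a custom number from 0 to 100.
    Treating score == threshold as a match is an assumption.
    """
    threshold = CONFIDENCE_PRESETS[level] if isinstance(level, str) else level
    if not 0 <= threshold <= 100:
        raise ValueError("confidence must be between 0 and 100")
    return score >= threshold


def fill_gaps(pixels, max_gap):
    """Bridge background runs of at most max_gap pixels between foreground pixels.

    pixels is one row or column of 0/1 values; longer gaps are left alone.
    """
    repaired = list(pixels)
    last_on = None  # index of the most recent foreground pixel
    for i, value in enumerate(pixels):
        if value:
            if last_on is not None and 0 < i - last_on - 1 <= max_gap:
                for j in range(last_on + 1, i):
                    repaired[j] = 1
            last_on = i
    return repaired


print(meets_confidence(80, "Medium"))  # 80 is at least 75
print(meets_confidence(80, "High"))    # 80 is below 90
print(fill_gaps([1, 0, 1, 0, 0, 0, 1], max_gap=2))
```

With max_gap=2, the one-pixel break is repaired while the three-pixel break is left as is. This mirrors the tip above: a higher value repairs larger disruptions but can also join strokes that were never connected.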